PPI Product Cross Sell

by: Pratik Sharma


A consumer bank with a range of products would like to cross-sell insurance to its customer base. Specifically, it wants to cross-sell the personal protection insurance (PPI) product to customers who hold a secured or unsecured loan but do not yet have a PPI product. Data is provided for the bank's customer portfolio, with fields covering product ownership, credit standing, outstanding amounts, whether the customer already has a PPI product and, if so, the type of PPI product they own.

The bank would like to adopt an analytics-driven approach, applied to this sample data, for deciding:

Exploratory Data Analysis - Categorical Columns


This includes data cleaning and univariate, bivariate, and multivariate analysis. Findings from exploratory data analysis:

Feature Reduction


Features were reduced using the Pearson pairwise correlation matrix, followed by each variable's correlation with PPI. Where features were multicollinear, their correlation with the PPI variable was checked and the one with the lower correlation was dropped from the analysis.
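The reduction rule described above could be sketched roughly as follows. This is a minimal illustration, not the exact code used in the analysis: the function name `drop_collinear`, the 0.8 threshold, and the toy columns `x1`, `x2`, `x3` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def drop_collinear(df, target, threshold=0.8):
    """For each highly correlated feature pair, drop the feature with the
    weaker absolute Pearson correlation to the target."""
    features = [c for c in df.columns if c != target]
    corr = df[features].corr(method="pearson").abs()
    target_corr = df[features].corrwith(df[target]).abs()
    dropped = set()
    for i, a in enumerate(features):
        for b in features[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if corr.loc[a, b] > threshold:
                # Keep whichever feature is more correlated with the target
                dropped.add(a if target_corr[a] < target_corr[b] else b)
    return df.drop(columns=sorted(dropped)), sorted(dropped)


# Toy data (hypothetical): x2 is a near copy of x1, x3 is independent noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
toy = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.05, size=200),
    "x3": rng.normal(size=200),
})
toy["PPI"] = (x1 + rng.normal(scale=0.5, size=200) > 0).astype(int)

reduced, dropped = drop_collinear(toy, target="PPI")
print("Dropped:", dropped)
```

The threshold is a judgment call; a stricter or looser cutoff changes which pairs count as multicollinear.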

Feature Selection


Weight of Evidence and Information Value

According to www.listendata.com, Weight of Evidence is:

The weight of evidence tells the predictive power of an independent variable in relation to the dependent variable. Since it evolved from the credit scoring world, it is generally described as a measure of the separation of good and bad customers. “Bad Customers” refers to the customers who defaulted on a loan, and “Good Customers” refers to the customers who paid back the loan.

According to the same source, Information Value is:

Information value is one of the most useful techniques to select important variables in a predictive model. It helps to rank variables on the basis of their importance.

| Information Value | Predictive Power |
|---|---|
| < 0.02 | Useless for prediction |
| 0.02 to 0.1 | Weak predictor |
| 0.1 to 0.3 | Medium predictor |
| 0.3 to 0.5 | Strong predictor |
| > 0.5 | Suspicious or too good to be true |
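As a rough illustration of how these quantities are usually computed for a categorical feature, the sketch below uses the common formulation WoE = ln(% of non-events / % of events) and IV = Σ (% of non-events − % of events) × WoE. The function name `woe_iv`, the smoothing constant `eps`, and the toy column `loan_type` are assumptions for the example, not part of the original analysis.

```python
import numpy as np
import pandas as pd


def woe_iv(df, feature, target, eps=0.5):
    """Return a per-category WoE table and the total IV for a binary target
    (1 = event, 0 = non-event)."""
    counts = df.groupby(feature)[target].agg(events="sum", total="count")
    counts["non_events"] = counts["total"] - counts["events"]
    k = len(counts)
    # eps smoothing avoids log(0) when a category has no events or no
    # non-events (an assumption of this sketch)
    pct_events = (counts["events"] + eps) / (counts["events"].sum() + eps * k)
    pct_non = (counts["non_events"] + eps) / (counts["non_events"].sum() + eps * k)
    counts["woe"] = np.log(pct_non / pct_events)
    counts["iv"] = (pct_non - pct_events) * counts["woe"]
    return counts, counts["iv"].sum()


# Toy data (hypothetical): PPI take-up differs by loan type
toy = pd.DataFrame({
    "loan_type": ["secured"] * 4 + ["unsecured"] * 4,
    "PPI": [1, 1, 1, 0, 1, 0, 0, 0],
})
table, iv = woe_iv(toy, feature="loan_type", target="PPI")
print(table[["events", "non_events", "woe", "iv"]])
print("Information Value:", round(iv, 3))
```

Numerical features would need to be binned first; the resulting IV can then be read against the thresholds in the table above.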

As mentioned here:

Exploratory Data Analysis - Numerical Columns


Predicting the `Insurance_Description` for Customers with PPI = 0


Steps performed include:

  1. Remove binned variables and object columns. Also drop the PPI, prdt_desc_le, and category_le variables, since they are nearly identical to the target variable, Insurance_Description.
  2. Check the correlation of Insurance_Description with the independent features in the dataframe.
  3. Create train, validation, and test sets with Insurance_Description as the target variable.
  4. Evaluate different classifier models; Decision Tree and Random Forest Classifier were the top two.
  5. Predict Insurance_Description using the CatBoost Classifier, with a recall and F1-score of 87%.
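Steps 3 and 4 above might look roughly like the sketch below. It runs on synthetic data from scikit-learn rather than the bank's portfolio, and the 60/20/20 split, the weighted averaging of recall and F1, and the model settings are assumptions made for illustration. A CatBoost model could be added to the `models` dictionary in the same way, since it exposes the same `fit` and `predict` methods.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned feature matrix and the
# Insurance_Description target (hypothetical data)
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# 60% train, 20% validation, 20% test, stratified on the target
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Compare candidates on the validation set only
val_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    val_scores[name] = {
        "recall": recall_score(y_val, pred, average="weighted"),
        "f1": f1_score(y_val, pred, average="weighted"),
    }

# Pick the best model by validation F1, then report once on the test set
best_name = max(val_scores, key=lambda n: val_scores[n]["f1"])
test_pred = models[best_name].predict(X_test)
test_f1 = f1_score(y_test, test_pred, average="weighted")
print(val_scores)
print(best_name, "test F1:", round(test_f1, 3))
```

Keeping the test set out of model selection means the final score is not biased by the comparison between candidates.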

Cross-Selling Insurance Products Using Market Basket Analysis


Assuming the predicted products (Insurance_Description) are the products customers currently hold, market basket analysis is used to explore which products customers tend to buy together, and that information is used to cross-sell other insurance products. Association rule mining is used to find associations between different objects in a set and to find frequent patterns in a transaction database, relational database, or any other information repository. Applications of market basket analysis include recommendation engines, cross-selling and product bundling, arranging items in retail stores, analyzing customers' credit card purchases to build profiles for fraud detection and cross-selling opportunities, and targeting telecom marketing efforts at customers.

Cross-selling means encouraging a customer who buys a product to buy a related or complementary product, with a view to expanding the banking business, reducing the per-customer cost of operation, and providing more satisfaction and value to the customer.

Apriori Algorithm & Metrics

“Frequently Bought Together” → Association

“Customers who bought this item also bought” → Recommendation
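To make the association idea concrete, here is a minimal pure-Python sketch of the first two passes of an Apriori-style search: it finds frequent single items, builds candidate pairs only from those items, and scores each pairwise rule by support, confidence, and lift. The function name `pair_rules`, the product labels `A`, `B`, `C`, and the 0.4 support threshold are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Score pairwise association rules by support, confidence, and lift."""
    baskets = [set(b) for b in baskets]
    n = len(baskets)

    # Pass 1: support of single items, keeping only frequent ones
    counts = {}
    for basket in baskets:
        for item in basket:
            counts[item] = counts.get(item, 0) + 1
    item_support = {i: c / n for i, c in counts.items() if c / n >= min_support}

    # Pass 2: candidate pairs are built only from frequent items, since a
    # pair cannot be frequent unless both of its items are
    rules = []
    for a, b in combinations(sorted(item_support), 2):
        support = sum(1 for basket in baskets if a in basket and b in basket) / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = support / item_support[x]  # P(y | x)
            lift = confidence / item_support[y]     # P(y | x) / P(y)
            rules.append({
                "antecedent": x,
                "consequent": y,
                "support": support,
                "confidence": confidence,
                "lift": lift,
            })
    return rules


# Toy product holdings per customer (hypothetical)
baskets = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"B"}, {"A", "B", "C"}]
rules = pair_rules(baskets, min_support=0.4)
for rule in sorted(rules, key=lambda r: r["lift"], reverse=True):
    print(rule)
```

A lift above 1 suggests the two products are held together more often than chance would predict, which is what makes a rule a candidate for cross-selling.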

Source: KD Nuggets & Research Paper